The Global Water Access Gap

Introduction

This blog analyzes the impact of differential access to drinking water around the world. As water is critical to our survival, water access affects nearly all parts of human life – including life expectancy, socioeconomic status, health and nutrition, and much more. With endless issues to study surrounding water access, we focus our blog on a few key interests of ours, including how water access levels relate to educational outcomes and economic development levels, as well as how social media are used as a tool for clean water advocacy.

We begin this blog with an overview of how drinking water levels and deaths due to unsafe drinking water vary at the global scale. We then analyze the differences in water access across various levels of economic development, and across urban and rural regions. Next, we look at water access in schools, and its effect on primary and secondary school enrollment rates. Finally, in keeping with our shared passion for social justice and equity, we present an analysis of the most common sentiments and words used in tweets regarding clean water advocacy.

Photo: Lys Arango for Action Against Hunger, Philippines

Data

Dataset 1: Drinking Water Access Worldwide by Households

  • Link: https://washdata.org/data/household#!/

  • Description: This dataset is from the World Health Organization (WHO), accessible through the JMP Global Database. The dataset contains household-level data on the coverage and service levels of water throughout the world. The JMP Database allows users to filter by residence type (urban, rural, and total) and year (2000 to 2017), as well as look at the data by country or by SDG (Sustainable Development Goal) regions. We were able to download this data in the format of a .csv file.

  • The water service level within this dataset is divided into five possible categories. In order of lowest to highest levels of access, they include Surface Water, Unimproved Service, Limited Service, Basic Service, and Safely Managed Service.

Dataset 2: Deaths Caused by Unsafe Drinking Water

  • Link: http://ghdx.healthdata.org/gbd-results-tool

  • Description: This dataset is from the Global Health Data Exchange (GHDx), accessed through the Institute for Health Metrics and Evaluation at the University of Washington. The data includes the percentage of a country’s deaths that are caused by unsafe drinking water, as well as the number of deaths caused by unsafe drinking water from 1990 through 2019. We were able to directly download a .csv file for data on all countries for the year 2017 (selected to match the most recent year of available data in Dataset 1).

Dataset 3: Drinking Water Access Worldwide in Schools

  • Link: https://washdata.org/data/school#!/

  • Description: This dataset is from the same source as Dataset 1 (WHO) and was also found through the JMP Global Database. However, rather than household-level data, this dataset focuses on how water service levels vary in schools across the world. The dataset allows users to view country-level data, as well as data by other relevant groupings including the SDG (Sustainable Development Goal) regions, which we chose to focus on, given the importance of our topic to the SDGs.

  • A key difference of importance between this dataset and Dataset 1 is that the water service levels are slightly different between the two. While the household-level dataset breaks water access down into 5 levels, this dataset only includes 3, likely due to data collection constraints. The 3 groupings (in order of lowest to highest levels of water service in schools) are No Service, Limited Service, and Basic Service.

Dataset 4: Primary and Secondary School Enrollment Rates

  • Link: https://ourworldindata.org/primary-and-secondary-education#enrolment-in-primary-school

  • Description: This website presents various statistics on school attendance, completion, and enrollment around the world, using data from the World Bank. We utilized 2 datasets from this site on school enrollment rates by country, one with primary school data and one with secondary school data. The data were measured at a variety of years for each country, and we kept the most recent year of data collection for each country in our analysis.

  • It is important to note that these data are reported as gross enrollment ratios: the number of students enrolled in primary or secondary school, regardless of age, divided by the population of the official age group for that level of schooling. Therefore, it is common for some countries to have a gross enrollment ratio above 100%, as over-aged or under-aged students are counted among the enrolled but not in the eligible-aged population.
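To make the gross enrollment ratio concrete, here is a minimal sketch in base R. The function name and the numbers are hypothetical and chosen only for illustration; they are not values from the dataset.

```r
# Gross enrollment ratio (GER), in percent:
# everyone enrolled at a level (any age) divided by the official-age population.
gross_enrollment_ratio <- function(enrolled_all_ages, official_age_population) {
  100 * enrolled_all_ages / official_age_population
}

# Hypothetical country: 1,000 children of primary-school age, but 1,150
# students enrolled in primary school once over-aged and under-aged
# students are included.
ger <- gross_enrollment_ratio(enrolled_all_ages = 1150,
                              official_age_population = 1000)
ger
```

Because the numerator counts students of any age while the denominator does not, the ratio in this toy case exceeds 100%.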

Dataset 5: Water Services and Economic Development

  • Link: https://databank.worldbank.org/home.aspx

  • Description: We used the World Bank’s databank tool to create a dataset that contains GDP per capita ($), water service level, water service type, and Gini coefficient of different countries across the world from 2000 to 2007. Since the distribution of GDP per capita is right skewed, we created a loggdp variable to better visualize the data.
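As a rough illustration of the log transformation mentioned above, the sketch below builds a loggdp column in base R. The data frame, its column names other than loggdp, and the GDP values are hypothetical, and the choice of the natural log is an assumption, since the base of the logarithm is not stated here.

```r
# Hypothetical GDP per capita values ($); note the long right tail.
econ <- data.frame(
  country = c("A", "B", "C", "D"),
  gdp_per_capita = c(500, 2000, 8000, 60000)
)

# Natural log (assumed) compresses the large values so that points
# spread more evenly along the axis when plotted.
econ$loggdp <- log(econ$gdp_per_capita)

# On the raw scale the mean sits far above the median (right skew);
# on the log scale the two are much closer.
raw_gap <- mean(econ$gdp_per_capita) - median(econ$gdp_per_capita)
log_gap <- mean(econ$loggdp) - median(econ$loggdp)
```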

Other Datasets:

  • For the text analysis of tweets, Masahiro created a unique dataset using the search_tweets() function through his Twitter developer account. The tweets addressed in the dataset are those generated in the period starting from April 27th and ending on May 7th this year. The tweets are selected if they include one of the following phrases: “water access,” or “Water access,” or “Water Access,” or “access to clean water,” or “access to drinking water.”

  • In addition to our main informational datasets, we also used the maps package for spatial visualizations and the United Nation’s SDG regional grouping dataset (https://unstats.un.org/sdgs/indicators/regional-groups) to complement the data in Dataset 5.
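The phrase-based selection of tweets described above can be sketched in base R as follows. This is not the query that was actually passed to search_tweets(); it only illustrates the matching rule on made-up example tweets, and it assumes that case-insensitive matching is an acceptable stand-in for listing the capitalization variants of “water access” one by one.

```r
# Phrases of interest; ignore.case = TRUE covers "Water access", "Water Access", etc.
phrases <- c("water access", "access to clean water", "access to drinking water")
pattern <- paste(phrases, collapse = "|")

# Made-up example tweets (not from the dataset).
example_tweets <- c(
  "Water Access is a human right",
  "Everyone deserves access to clean water",
  "I went swimming today"
)

# TRUE for tweets containing at least one of the phrases.
keep <- grepl(pattern, example_tweets, ignore.case = TRUE)
selected <- example_tweets[keep]
```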

Worldwide Water Service Access and Deaths Caused by Unsafe Drinking Water (Alastair)

Background

How important is it to have access to safe drinking water?

In this map, we examine what percent of a country’s total deaths in the year 2017 are caused by unsafe drinking water. The map provides a broad visual overview of which countries have the highest proportion of deaths due to unsafe drinking water. From there, we are able to examine a particular country’s access to different water service levels, as well as note the total number of deaths by unsafe water.

Percentage of Deaths and Water Service Levels by Country (Map)

Limitations

Countries without data on water service access level and deaths caused by unsafe drinking water include: Taiwan, Argentina, Dominica, Palestine, Eritrea, Central African Republic, and Saint Kitts and Nevis.

The two countries not mapped are: Tokelau and Tuvalu. They are not included in leaflet’s world map, although there is data about their access to different water service levels as well as data on deaths related to unsafe drinking water for these two countries.

The year 2017 was the most recent year included in the water access dataset, so we are working under the assumption that those conditions are similar enough to the conditions in 2021 to draw meaningful analysis about the current global water access gap.

Conclusions

Countries in Central/Sub-Saharan Africa and South/Southeast Asia appear to have the highest percentage of deaths caused by unsafe drinking water in 2017.

Although countries like Chad, Nigeria, and Madagascar have some of the highest percentages of deaths, India has the largest number of deaths of any country, with over 500,000 deaths caused by unsafe drinking water in 2017.

Interestingly, even countries with 100% safely managed service may still have some deaths, such as New Zealand, which had 14 deaths caused by unsafe drinking water in 2017.

Countries with few deaths tend to be countries where a high percentage of the population relies on at least basic service for drinking water. Conversely, most countries experiencing deaths due to unsafe drinking water have a significant portion of their population relying on surface water or on an unimproved or limited service.

Water Service Access, Economic Development, and Inequality (Siyi)

Water service access and economic development

How do countries of different economic development levels differ in their access to water services and how has that changed over time?

Click here to view the interactive Shiny app.

Use of Data

This Shiny app focuses on two variables as indicators of economic development: GDP per capita ($) and the Gini coefficient. Despite its limitations, GDP per capita, which is the economic output per person in a country, is generally regarded as an effective indicator of economic development levels. The Gini coefficient measures the income inequality of a country, with a value of 0 representing perfect equality and 1 representing maximal inequality. Although these two variables do not provide a comprehensive assessment of economic development, data on them are relatively widely available. Through this Shiny app, we are interested in exploring possible associations between water service coverage, GDP per capita, and economic inequality.
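For readers unfamiliar with the Gini coefficient, the following base R sketch computes it for a small vector of incomes using the mean absolute difference formula. The function and the income vectors are hypothetical illustrations; the World Bank estimates its published figures from survey data, not with this snippet.

```r
# Gini coefficient via the mean absolute difference:
# G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean(x))
gini <- function(x) {
  n <- length(x)
  sum(abs(outer(x, x, "-"))) / (2 * n^2 * mean(x))
}

# Everyone earns the same: perfect equality.
equal_incomes <- gini(c(10, 10, 10, 10))

# One person holds all the income: for n people this gives (n - 1) / n,
# which approaches 1 as the population grows.
one_holds_all <- gini(c(0, 0, 0, 40))
```

With only four people, the most unequal case yields 0.75, not exactly 1; the upper bound of 1 is reached only in the limit of a large population.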

Water service access and urban-rural inequality

How does access to water services differ across urban and rural regions around the world and how has that changed over time?

Click here to view the interactive Shiny app.

Limitations

  • There is a lack of data on water service coverage, GDP per capita, and Gini coefficient by country. The visualization only includes countries that have data on all of these variables in a given year; therefore, it is hard to develop a comprehensive analysis, as for some years, only a few countries in an entire region have sufficient data to be plotted.
  • There is also a lack of data for regional water service coverage by residence type in general. Many regions do not have data for either urban or rural service coverage or both, and it is sometimes hard to tell what specific temporal changes look like.
  • There is additionally an imbalance in the lack of data. Some regions, such as Europe and Northern America, have significantly more data available than other regions from the global South. Both overrepresentation and underrepresentation could lead to misleading trends.

Conclusions

  • There is a global growth in water service coverage from 2000 to 2017.
  • Countries of higher economic development levels tend to have better water service coverage, and countries with very large income disparity tend to be those that rank in the middle in water service coverage.
  • There is an urban-rural gap in water service coverage worldwide, although this disparity decreased from 2000 to 2017. The scale of this inequality differs across regions, water service types, and time.

Water Access in Schools (Jamie)

Background

This section of our blog explores the intersection between water access and education. Much like textbooks and classrooms, water is an essential resource students need to succeed in school. A 2013 study on primary schools in Brazil even showed that access to piped water improved student test scores by 11% (Mejía, 2014)! While water fountains are found at nearly every corner of most US schools, access to water in schools varies greatly on the global scale.

For some background on water access in schools across the world, click here to view the interactive Shiny app, and read below for more details on the app.

The first tab in my Shiny app highlights the differences in water service levels in schools among the 8 SDG (Sustainable Development Goal) regions in 2018, the most recent year of data collection from the WHO. These differences can be viewed among all schools in a region, or by only primary schools or secondary schools. Basic service, the highest service level, refers to having a safe water supply available on school premises. Limited service refers to having a safe water supply nearby (up to a 30-minute walk round trip), but not directly on school premises. Lastly, no service refers to having only an unsafe water source (e.g. surface water, unprotected spring).

The second tab in the app addresses how basic water services in schools have changed over the last decade in the SDG regions. This is important context to keep in mind, as the following section analyzes a more specific research question regarding the impact of basic water services in schools on educational outcomes.

Impact on Enrollment

Now that we’ve looked at how water service levels in schools vary across the world, this section takes that analysis a step further and incorporates school enrollment data. Specifically, I was interested in the following research question:

Does differential access to basic drinking water in schools impact primary and secondary school enrollment rates?

I answer this question in my second interactive Shiny app. As a reminder, basic water service refers to schools that have a safe water supply available on school premises.

Conclusions

Based on the Shiny app in my background section, water access appears to be slightly better in secondary schools than in primary schools across all SDG regions, with the exception of Latin America and the Caribbean. Looking at changes over the last decade in the proportion of schools providing a basic water supply across the SDG regions, Northern Africa and Western Asia and Sub-Saharan Africa are the only regions that experience some fluctuations (although still quite minimal). Looking at the same data separately for primary and secondary schools, these fluctuations occur through changes at the secondary schooling level, as primary school rates stay relatively constant throughout the decade. This is not all that surprising, given that primary school is compulsory in most countries, whereas secondary schooling, particularly in developing countries, is often undergoing significant policy and resource changes.

Based on my second Shiny app that incorporated school enrollment rates, there appears to be no relationship between providing a basic water supply and school enrollment rates at the primary school level, whereas there appears to be a positive relationship when looking at just secondary schools. This may be suggestive of a relationship between secondary schools providing a basic water service to students and students enrolling in schools, which I would hypothesize may have a particularly strong impact in developing countries.

Limitations

The main limitation of my analysis was missing data. Since data on water access in schools is collected much less frequently than household-level water access data, there were some holes in my analysis that prevent me from drawing conclusions in certain regions. Specifically, there is insufficient WHO data for Eastern and South-Eastern Asia, and several countries lacked recent data regarding their schools.

Advocacy for Water Access Around the World (Masahiro)

Introduction

In this tab, we take a look at tweets advocating for greater access to water around the world in order to discover trends among them. To gather the tweets, the search_tweets() function was run on May 4th and May 7th, and the tweets posted roughly from April 27th to May 7th were recorded in the same dataset. Every included tweet contains at least one of the following phrases: “water access,” or “Water access,” or “Water Access,” or “access to clean water,” or “access to drinking water.” For more details, take a look at the “Wrangling - Masahiro” file in the same repo. By exploring the following three questions with data, we aim to learn what kind of rhetoric people employ when calling for more access to water around the world.

  • What are the common words used in the tweets requesting more access to clean water around the world?
  • What are the common sentiments of the words observed in those tweets?
  • What do those common words and sentiments imply about the rhetoric people use to argue for clean water in some of the regions lacking water access?

In addition to the removal of so-called stop words from the dataset, we also omitted the word “access” because it is obvious that all the tweets should include that word from the way we collected data. Doing so helps us produce more meaningful word clouds and sentiment analyses.

Word Cloud

First, we examine a word cloud of all the words except those removed during data wrangling, to get a sense of the most common words in the focal tweets.

In the above word cloud, “https” stands out in size, which implies that many tweets related to water access advocacy link to or cite other web resources. “clean” is also displayed prominently, partly because “access to clean water” is one of the phrases we searched for when collecting tweets. However, we also searched for “access to drinking water,” and “drinking” does not appear as large as “clean,” so the word “clean” seems to carry particular weight in arguments for greater water access.

Among the words displayed at smaller sizes, the cloud includes many related to potential uses of water or implications of water access: “sanitation,” “healthcare,” “health,” “food,” and “hygiene.” Another interesting word in the cloud is “india,” whose presence may be attributable to India’s socioeconomic standing or its especially large population. Finally, we found it intriguing that “covid” appears in the visualization, because it suggests that tweets about water access were often associated with the pandemic, even though there did not seem to be many explicit or obvious connections between the infectious disease and water access.

Sentiment Analysis

Next, we dive into the sentiments reflected in the English used by those advocating for water access on Twitter. We use the NRC lexicon to assign sentiments to the words observed in the tweets, and visualize the common sentiments with the following graph.

As can be seen, positive, trust, and joy are the most common sentiments among the words included in the tweets. Negative follows those top three, and the least common sentiments, such as anger, anticipation, fear, and sadness, come after it. This bar chart confirms that many of the words in the analyzed tweets have positive connotations, meaning not only “positive” as a sentiment but also “trust” and “joy.” To learn more about the words tagged with these sentiments, we use a comparison cloud (see the next tab).
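The general idea of lexicon-based sentiment counting can be sketched in base R as below. The lexicon here is a tiny hypothetical stand-in, not the real NRC lexicon, and the tokens are made up; the sketch only shows the mechanics of joining words to sentiments and counting, where one word may carry several sentiments.

```r
# Hypothetical mini-lexicon (NOT the real NRC entries): one row per
# word-sentiment pair, so a word can appear under several sentiments.
toy_lexicon <- data.frame(
  word = c("clean", "clean", "clean", "safe", "safe", "disease"),
  sentiment = c("positive", "trust", "joy", "positive", "trust", "negative"),
  stringsAsFactors = FALSE
)

# Made-up tokens, one row per word occurrence in the tweets.
tokens <- data.frame(
  word = c("clean", "water", "safe", "clean", "disease"),
  stringsAsFactors = FALSE
)

# Inner join: words missing from the lexicon ("water") are dropped,
# and each occurrence is repeated once per sentiment it carries.
tagged <- merge(tokens, toy_lexicon, by = "word")

# Count word occurrences per sentiment, most common first.
sentiment_counts <- sort(table(tagged$sentiment), decreasing = TRUE)
```

A bar chart of sentiment_counts would then correspond to the kind of graph discussed above.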

Comparison Cloud

The comparison cloud below displays the words that are commonly used in the text collected from Twitter and that carry the sentiment “positive,” “trust,” or “joy.” Before discussing the visualization itself, we lay out how the code below works. A comparison cloud accomplishes two goals at once: it compares the relative frequency of words, and it classifies the most common words into categories based on certain criteria. To build one, the data must be transformed into a matrix whose columns correspond to the categories (in this case, the sentiments) and whose rows are named by word. A fair amount of wrangling was needed to create a dataset in which each row corresponds to a word and each column to a sentiment. If interested, see the commented code below.

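# packages assumed by this chunk: dplyr (verbs and %>%), tidyr (spread),
# and wordcloud (comparison.cloud)
library(dplyr)
library(tidyr)
library(wordcloud)
# preliminary wranglings below
# first extract words with the connotations of interest
# tweets_sentiment = dataset used for sentiment analysis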
pure_words <- tweets_sentiment %>%
  filter(sentiment == "positive" | sentiment == "trust" |
           sentiment == "joy") %>%
  # then collapse the rows so that each word only occupies a single row
  group_by(word) %>%
  summarize()
# now prepare the dataset to be joined with the dataset about the count of
# each word with the three focal sentiments
pure_words_copied <- pure_words %>%
  # let each word occupy three rows at the same time
  slice(rep(1:n(), each = 3)) %>%
  mutate(number = row_number()) %>%
  # list up all the sentiments of interest
  mutate(sentiment = case_when(number %% 3 == 1 ~ "positive",
                               number %% 3 == 2 ~ "trust",
                               number %% 3 == 0 ~ "joy")) %>%
  select(word, sentiment)
# the below dataset is about the count of each word with the three connotations
# of interest
comparison_words_prep <- tweets_sentiment %>%
  # extract those with the three sentiments of interest
  filter(sentiment == "positive" | sentiment == "trust" |
           sentiment == "joy") %>%
  # and count the frequency
  group_by(word, sentiment) %>%
  summarize(N = n())
comparison_words_prep_2 <- pure_words_copied %>%
  # join the dataset with the data about the count (used for the bar)
  left_join(comparison_words_prep, by = c("word", "sentiment")) %>%
  # if some words do not imply certain sentiments, it will be reflected as 
  # N/A values, so turn it into 0
  mutate(count = case_when(is.na(N) ~ 0,
                           TRUE ~ as.numeric(N))) %>%
  select(word, sentiment, count)
# one last step to make each column refer to each sentiment
comparison_words_prep_3 <- comparison_words_prep_2 %>%
  spread(key = sentiment, value = count)
# the below code translates the data frame into a matrix, and each row name of
# the matrix should correspond to the word
comparison_words <- comparison_words_prep_3 %>%
  select(-word) %>%
  as.matrix()
rownames(comparison_words) <- comparison_words_prep_3$word

# create the comparison cloud
colors1 <- c("#48F11F", "#1226D2", "#CB0A3E")
colors2 <- c("#CCFF99", "#7F88EF", "#EF7FCA")
comparison.cloud(comparison_words, max.words = 100,
                 random.order = FALSE,
                 colors = colors1,
                 title.colors = colors1,
                 title.bg.colors = colors2)
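For readers who want a more compact view of the target data structure, the sketch below builds the same kind of word-by-sentiment count matrix in base R with table(). It is an alternative illustration, not the code used for the blog, and the tagged data frame is a made-up stand-in for tweets_sentiment.

```r
# Made-up stand-in for tweets_sentiment: one row per (word occurrence, sentiment).
tagged <- data.frame(
  word = c("clean", "clean", "safe", "save", "clean", "disease"),
  sentiment = c("positive", "trust", "trust", "joy", "positive", "negative"),
  stringsAsFactors = FALSE
)

# Keep only the three focal sentiments.
focal <- c("positive", "trust", "joy")
tagged <- tagged[tagged$sentiment %in% focal, ]

# Cross-tabulate words against sentiments; using a factor with fixed levels
# keeps all three columns, and missing combinations are filled with 0.
tab <- table(word = tagged$word,
             sentiment = factor(tagged$sentiment, levels = focal))

# Drop the "table" class to get a plain integer matrix with words as row names.
comparison_matrix <- unclass(tab)
```

A matrix of this shape (words as row names, one column per sentiment) is what the comparison.cloud() call above takes as its first argument.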

As in the first word cloud, “clean” stands out in this comparison cloud for its frequency of use, as shown by its large size. However, as the bar chart in the previous tab shows, words classified as positive have a large presence overall in the analyzed tweets, so the frequency of “clean” is not so large that it dominates the text analysis on its own. Looking more closely at the visualization, we noticed that it includes many words related to potential outcomes of greater access to water around the world: “food,” “healthy,” “save,” “green,” “income,” “medical,” “safe,” “luxury,” and “survive.” This finding resonates with the insights from the original word cloud, because both visualizations exhibit many words associated with promising implications of water access. The comparison cloud also shows that the tweets contain a number of words related to the process of ensuring water access for underprivileged people: “advocate,” “guarantee,” “partnership,” “supporting,” “improving,” “conservation,” and “providing.” This suggests that describing the steps needed to secure water access around the world is what leads these tweets to include many words with positive connotations, such as positivity, trust, or joy.

Discussion

Through the exploration of the general word cloud, a bar chart, and a comparison cloud, this research has revealed that the tweets requesting greater access to water across the world incorporated many words that connote positivity, joy, and trust, and that they specifically include many words related to the potential outcomes of access to water, such as “sanitation” or “food.” We believe that this may plausibly be attributable to the fact that many of the tweets describe and discuss how securing water access can improve the lives of people in developing countries or what such water access enables. This explanation sounds convincing to some extent given that the comparison cloud has shown many words which can be associated with the process of improving water access, such as “partnership” or “donation.”

In other words, this study has revealed that the tweets arguing for water access around the world do not engage with negative words, such as death or disease, as much as they do with words with positive sentiments: “positive,” “joy,” “trust.” This implies that the tweets for advocacy of water access may talk more about how greater water access can resolve problems in the world by, for example, improving the sanitation, food access, and safety in some areas, rather than about how lack of water causes diseases, deaths, conflicts, or other sufferings on the earth. We find this speculation fairly plausible given all the results above, and also we find it intriguing that people describe more of the positive aspects of securing clean water around the world and less of the negative consequences caused by lack of water in discussing water access around the world.

However, we also acknowledge that these findings have limitations. The word cloud, bar chart, and comparison cloud here are all generated after cutting the tweets into individual words. In other words, we are not really analyzing sentences, so we cannot distinguish between the following two phrases: “today’s effort for greater water access can improve sanitation around the world,” and “today’s effort for greater water access does not improve sanitation around the world.” The two phrases include an almost identical set of words. Moreover, the negative connotation of the latter is almost entirely due to the word “not,” which would have been removed as a stop word at the beginning of the data wrangling, so our analysis is not capable of distinguishing the sentiments of the two phrases.

Our findings do surface some common words among the tweets of interest, point to positive, joy, and trust as common sentiments, and reach potential explanations about people’s rhetoric which also resonate with what the visualizations exhibit. In short, as a blog project, we are confident that the text analysis of tweets has given substantial new perspectives on people’s discourse about greater water access around the world. However, we also believe that more techniques of text analysis are needed to generate more accurate and meaningful findings. Future research may analyze these tweets not only as a set of words but also as a collection of bigrams or larger units of English words in order to build upon and expand the discovery here.
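The loss of negation described above can be demonstrated with a short base R sketch. The tokenizer and the stop-word list here are simplified, hypothetical stand-ins for the ones used in the actual wrangling; they are only meant to show why the two example phrases become indistinguishable.

```r
# Simplified tokenizer: lower-case, strip everything except letters and
# spaces, split on spaces, then drop stop words.
tokenize <- function(text, stop_words) {
  cleaned <- gsub("[^a-z ]", "", tolower(text))
  words <- strsplit(cleaned, " +")[[1]]
  words[!words %in% stop_words]
}

# Hypothetical mini stop-word list; like common stop-word lists, it includes "not".
toy_stop_words <- c("for", "can", "does", "not", "around", "the")

phrase_a <- "today's effort for greater water access can improve sanitation around the world"
phrase_b <- "today's effort for greater water access does not improve sanitation around the world"

tokens_a <- tokenize(phrase_a, toy_stop_words)
tokens_b <- tokenize(phrase_b, toy_stop_words)

# After stop-word removal, both phrases reduce to the same words.
same_tokens <- identical(tokens_a, tokens_b)
```

Working with bigrams instead would keep a pair such as “not improve” together as a single unit, which is one reason a bigram analysis could separate the two phrases.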

Conclusion

  • Countries in Central/Sub-Saharan Africa and South/Southeast Asia appear to have the highest percentage of deaths caused by unsafe drinking water in 2017.

  • Countries with few deaths tend to be countries where a high percentage of the population relies on at least basic service for drinking water. Conversely, most countries experiencing deaths due to unsafe drinking water have a significant portion of their population relying on surface water or on an unimproved or limited service.

  • There is a global growth in water service coverage from 2000 to 2017.

  • Countries of higher economic development levels tend to have better water service coverage, and countries with very large income disparity tend to be those that rank in the middle in water service coverage.

  • There is an urban-rural gap in water service coverage worldwide, although this disparity decreased from 2000 to 2017. The scale of this inequality differs across regions, water service types, and time.

  • In schools, water access appears to be slightly better in secondary schools than in primary schools across all SDG regions except Latin America and the Caribbean.

  • In terms of the impact of water on enrollment rates, there appears to be a positive relationship between the proportion of schools with a basic water supply and enrollment rates at the secondary school level.

  • In terms of water advocacy, tweets tend to use positive language about what people can gain from greater access to water and how that can be achieved, rather than focusing on the detrimental impacts of poor water access.

Bibliography

  1. Arango, L. (2020, March). Action Against Hunger, Philippines [Photograph]. Action Against Hunger. https://actionagainsthunger.ca/world-water-day-access-to-clean-water-is-more-important-than-ever/

  2. DataBank. The World Bank Group. 2021. https://databank.worldbank.org/home.aspx

  3. Global Health Data Exchange. Institute for Health Metrics and Evaluation at the University of Washington. 2019. http://ghdx.healthdata.org/gbd-results-tool

  4. Mejía, Francisco. “How important is clean water for education?” Impacto, 20 Feb. 2014, blogs.iadb.org/efectividad-desarrollo/en/how-important-is-clean-water-for-education/.

  5. Roser, M. and Ortiz-Ospina, E. (2013) “Primary and Secondary Education”. Our World In Data. https://ourworldindata.org/primary-and-secondary-education

  6. SDG Indicators. United Nations. 2021. https://unstats.un.org/sdgs/indicators/regional-groups

  7. Water Supply, Sanitation and Hygiene (WASH) Household Data. WHO/UNICEF Joint Monitoring Programme (JMP). 2017. https://washdata.org/data/household#!/

  8. Water Supply, Sanitation and Hygiene (WASH) School Data. WHO/UNICEF Joint Monitoring Programme (JMP). 2019. https://washdata.org/data/school#!/